## Lightweight and Secure PUF Key Storage Using Limits of Machine Learning



Meng-Day (Mandel) Yu<sup>1</sup>, David M'Raïhi<sup>1</sup>, Richard Sowell<sup>1</sup>, Srinivas Devadas<sup>2</sup>

<sup>1</sup>Verayo, Inc., San Jose, CA, USA <sup>2</sup>MIT, Cambridge, MA, USA



## Agenda

## Physical Unclonable Function (PUF) Overview

#### **PUF Noise Profile**

• Response size, temperature, voltage

#### **Deriving Stable PUF Bits**

- Traditional: Large block ECC, Two-stage ECC
- Lightweight: Stable bits w/o complex ECC

#### **Security Framework**

• "Use what cannot be learned about the system"

#### Conclusions



# PUF Overview



## **Physical Unclonable Functions (PUF)**



Tiny electronic circuits extract silicon *manufacturing variations* 

Unique characteristics = "silicon biometrics"

PUF responses are "noisy"

To generate <u>Stable PUF Bits</u>: add error correction algorithm





# PUF Noise Profile



## **PUF Noise Profile**





## Deriving Stable PUF Bits



## **Methods to Derive Stable Bits**

#### Large Block ECC

- Single stage error correction
  - BCH(255,63,t=30) [Suh-MIT2005]
  - BCH(255,13,t=59) [AMSSW-IEEE\_S&P2011]

#### **Two-stage ECC**

VERAYO

- Quadratic reduction in complexity
  - Repetition(11,1,t=5) + Golay(24,13,t=3) [BGSST-CHES2008]
  - Repetition(11,1,t=5) + RM(16,5,t=3) [BGSST-CHES2008]
  - Repetition<sub>SoftDecision</sub>(3,1,t=1) + RM<sub>SoftDecision</sub>(64,22,t=7) [MTV-CHES2009]
  - IBS + BCH(63,30,t=6) [YD-IEEE\_D&T2010]

#### Lightweight (no complex ECC)

- Use "Index Based Syndrome" (IBS) w/o BCH
- Additional complexity reduction (75%)
- Add retry, simple coding to improve reliability

## Index-Based Syndrome (IBS) Coding

From [YD-IEEE\_D&T2010]

Use a group of PUF output values to store a bit sequence Simple case: a sequence of 1 bit

- Encoder:
  - If  $\underline{B} = 1$ ,  $\underline{S} = \text{index of } f_1(\underline{R}_0 = r_0, \dots, \underline{R}_{q-1} = r_{q-1})$
  - If  $\underline{B} = 0$ ,  $\underline{S} = \text{index of } f_0(\underline{R}_0 = r_0, \dots, \underline{R}_{q-1} = r_{q-1})$

Let  $f_1 = \max$  function,  $f_0 = \min$  function <u>*B*</u> = bit to store, <u>*S*</u> = Syndrome Word

- Decoder:
  - $\underline{B'} = \text{sign}_{of} (\underline{R}_{s})$

#### Advantages:

VERAYO

- Trivially simple encoder and decoder
- High coding gain -> reduction in ECC complexity
- Provably secure (more later)

## Size Comparisons (Xilinx Virtex-5 LX50)

| Lightweight<br>(IBS)              | 2-stage ECC<br>(IBS + BCH63)     | Large Block<br>(BCH255)            |
|-----------------------------------|----------------------------------|------------------------------------|
| 69 registers                      | 471 registers                    | 6400 registers<br>(est. using 16x) |
| ~1.2% SLICE<br>count<br>(99/7200) | ~5% SLICE<br>count<br>(393/7200) | ~65% SLICE<br>count                |

- Includes decoder + encoder
- Does not include APB interface, I/O buffering
- Even smaller if test logic, configurability removed





## **Decoder Core Comparisons (Xilinx Spartan 3E-500)**

Retargeted implementation for Spartan 3E (w/o optimizations) for comparison

Use best results from [MTV-CHES2009], [BGSST-CHES2008]

Goal: 128-bit key

|                | Current Work                                                             |                                                                   | [MTV-CHES2009]                                                                                 | [BGSST-CHES2008]<br>PUF-Optimized                                                           | [BGSST-CHES2008]<br>Decoder-Optimized |  |
|----------------|--------------------------------------------------------------------------|-------------------------------------------------------------------|------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|---------------------------------------|--|
| Area           | 116 SLICES<br>(no μcode required)                                        |                                                                   | 164 SLICES<br>(μcode ROM required)                                                             | 580 SLICES<br>(μcode ROM required)                                                          | 110 SLICES<br>(μcode ROM required)    |  |
| Dec<br>Cycles  | ~16640 cycles<br>(@ 100Mhz+)                                             |                                                                   | ~10298 cycles<br>(@ 50.2Mhz)                                                                   | >= 24024 cycles<br>(@ 151.5Mhz)                                                             | >= 29925 cycles<br>(@ 175.4Mhz)       |  |
| Helper<br>Data | 780 bit                                                                  |                                                                   | 13952 bit                                                                                      | 3824 bit                                                                                    | 6288 bit                              |  |
| PUF Size       | 1280 OSC*<br>(Security-<br>Optimized)                                    | 256 OSC*<br>(PUF-<br>Optimized)                                   | 1536 bit SRAM**                                                                                | 3696 bit SRAM**                                                                             | 6160 bit SRAM**                       |  |
| Stability      | -55°C to 125°C, V <sub>nom</sub> +/- 10%,<br>WC VT Corners, Aging        |                                                                   | PUF noise model does not account for V,T.                                                      | PUF noise model accounts for -20°C to 80°C<br>[GSST-CHES2007]. No voltage.                  |                                       |  |
| Security       | Insufficient<br>equations to<br>learn system<br>(no i.i.d<br>assumption) | Use<br>unlearnable<br>part of system<br>(no i.i.d.<br>assumption) | Soft decision<br>information is<br>information-<br>theoretically secure<br>(i.i.d. assumption) | No explicit security argument to account for<br>leaks associated w/ heavy repetition coding |                                       |  |

\* 5 inversions per OSC (~3 NAND2 equivalent gate, 1<sup>st</sup> order est.) \*\* 6T cell per bit (~3 NAND2 equivalent gate)



## WC Voltage / Temperature Corners

Empirical PUF data from Xilinx Virtex-5 FPGAs Error Free Performance using 4-bit Index

- 1M+ blocks, implied failure rate < 1 ppm
- SS Corner 125°C, 0.9V
- FF Corner -55°C, 1.1V



## **Accelerated Aging**

~90M+ blocks, error free performance, 4.25 bit Index

- Implied error rate <= 12 parts per <u>billion</u> (ppb)
- Accelerated age: 20+ yrs @ 55°C
- Provisioning: 25°C, 1.0V; Regeneration: 125°C, 1.10V

Aging deteriorates silicon, increasing Indexing requirement by 1/4 bit





### **Voltage Testing, ASIC**

PUF + Indexing Algorithm in .13µm silicon

• 4 to 5 bit Index for reliable (ppm level or better) operation

#### **Results consistent with FPGA**



## Security Framework



## **Security Dependencies of Prior Work (1)**

Recall: [BGSST-CHES2008]

- No explicit security argument for use of Repetition[11,1,t=5] code
- Heavy repetition coding highly sensitive to PUF bias:

**Bits leaked per repetition-coded bit =**  
[YD-IEEE\_D&T2010] 
$$\frac{|PUFbias - 0.5|}{|repetition/2|} - 0.5|$$

1. if PUF bias = .55, all bits leaked!

2. if PUF bias = .505, 1 bit leaked out of every 9 bits repetition-coded

... this is true even if PUF output bits are assumed to be i.i.d.

#### Current work avoids heavy repetition coding



## **Security Dependencies of Prior Work (2)**

[MTV-CHES2009] and [YD-IEEE\_D&T2010] both use proofs that require i.i.d. PUF output assumption (implicitly or explicitly)

Questions:

- Memory PUF: Are there correlations based on memory word columns?
- Arbiter PUF / OSC PUF: Are there correlations with reuse of delay elements?

Can we remove i.i.d. assumption?



## **Unconditional Security**

#### **Recall:**

- Shannon Entropy:  $H(\underline{X}) = -p(x) \log_2 p(x)$
- Mutual Information:  $I(\underline{Y}; \underline{X}) = H(\underline{Y}) H(\underline{Y} | \underline{X})$

#### Unconditional security (perfect secrecy) [Shannon, 1949]

- Ciphertext share no information with the Key
- Secure against a *computationally-<u>un</u>bounded* adversary
- <u>Strongest form of security</u>
- Information shared between Ciphertext and Key:
  - $I(\underline{CT}^{alg}; \underline{Key}) = H(\underline{Key}) H(\underline{Key} | \underline{CT}^{alg})$

## We adapt this unconditional security measure to develop a syndrome leakage metric...



## Leaked Bits (LB)

What is the information shared between a Syndrome Word and a perfect model of the PUF?

- Code offset [Dodis, 2004], 3x repetition coding
  - $LB(\underline{S}^{3x}) \equiv I(\underline{S}^{3x}; \underline{M}^{\infty}) = H(\underline{S}^{3x}) H(\underline{S}^{3x} | \underline{M}^{\infty}) = 3 1 = 2 \text{ bits}$
- Index-Based Syndrome (IBS) Coding [Yu, 2010], 3-bit index
  - $LB(\underline{S}^{3i}) \equiv I(\underline{S}^{3i}; \underline{M}^{\circ}) = H(\underline{S}^{3i}) H(\underline{S}^{3i} | \underline{M}^{\circ}) = 3 1 = 2$  bits
- Can we leak less information?



## Syndrome Distribution Shaping (SDS): Intuition

IBS: pick most + or most - value, to encode a "1" bit or a "0" bit



Cryptographic Hardware and Embedded Systems (CHES) 2011, Nara, Japan

ERAYO

## Syndrome Distribution Shaping (SDS)

Let *p* = probability a PUF output choice is ignored or skipped

• i.e., the max or min selection ignores that PUF output choice

#### Reducing Leaked Bits while preserving error correction power:

•  $I(\underline{S}^{3i}, \underline{M}^{\infty}) = 2$  bits

- $I(\underline{S}^{W=5, p=3/4}, \underline{M}^{\circ}) = 0.80 \text{ bits}$
- $I(\underline{S}^{W=6, p = 7/8}, \underline{M}^{\infty}) = 0.71 \text{ bits}$
- $I(\underline{S}^{W=7, p = 15/16}, \underline{M}^{\infty}) = 0.67 \text{ bits}$

"Choosing best out of 8" "Choosing best out of 16, w/ ~half of the choices eliminated"

Leaked Bits 42x!



## **Machine Learning Results**

Ruhrmair, et. al., "Modeling Attacks on PUFs", ACM CCS 2010.

$$N_{CRP} \approx 0.5 \frac{k+1}{\varepsilon}$$

N<sub>CRP</sub> : number of challenge / response pairs k: # of delay parameters in an arbiter PUF ε: classification error

Observation: Adversary with k C/R pairs cannot do much better than guessing, i.e.,  $\epsilon \approx 0.5$ .



## What *cannot* be learned? (1)

Now rearrange the equation, rename terms, etc.

$$N_{CRP} \approx 0.5 \frac{k+1}{\varepsilon}$$
  $\varepsilon \approx \min(0.5 \frac{k+1}{\Sigma LB}, 0.5)$ 

Conservative: stay <u>safely within</u> boundary where  $\varepsilon = 0.5$  such that virtually <u>nothing</u> is learned from Syndrome Bits.

When  $0 \le \epsilon \le 0.5$ , <u>something</u> is learned from the Syndrome Bits.

But how much information *cannot* be learned?



## What <u>cannot</u> be learned? (2)

We know "bias" reduces min-entropy  $H_{\rm m}$ 

## Now, instead of "bias", we have $\epsilon$ <u>conditioned upon</u> Syndrome Words known to the adversary at some point in time

-> Use conditional version of min-entropy

 $H_{\infty}(\underline{X} \mid \underline{Y}) \equiv -\log_2 \left( E_{y \leftarrow \underline{Y}} [2^{-H_{\infty}(\underline{X} \mid \underline{Y} = y)}] \right)$  [Dodis, "Fuzzy Extractor", 2004]  $\text{where: } H_{\infty} \equiv -\log_2 \left( \text{pr}_{\max}(.) \right)$ 

we note this applies to a bit-oriented learner as well as a block-oriented learner

- What the adversary can learn is reflected in classification error  $\boldsymbol{\epsilon}$
- $H_{\infty}$  computes the amount of secrecy left in the system using  $\epsilon$
- $H_{\infty}$  reflects the min-entropy that the ML-adversary cannot "touch" or learn



## **Security Sketch**

VERAYO



## Now, without i.i.d. PUF output assumption...

#### **Secure Construction #1:**

- 640 OSC pairs, forming k = 640 "delay parameters"
- Only k / 2 equations (each with a 1 bit outcome) leaked via Syndrome
- 320 degrees of freedom to keep secret 128-bits of information

"insufficient equations to learn system"

#### **Secure Construction #4:**

- 128 OSC pairs, forming k = 128 "delay parameters"
- Use ε curve to compute amount of secrecy left in the system
- Can extract a 128-bit key using results from [Ruhrmair, 2010]
  - Secure against a (ε,ΣLB) Machine-Learning adversary

#### "use unlearnable part of system"

# Conclusions



## Conclusions

#### **Lightweight PUF Key Generation**

- 75% reduction in complexity from 2-stage ECC
  - Two stage ECC better characterized and also available
- Environmentally Stable
  - Temperature: -55°C to 125°C
  - Voltage: V<sub>nom</sub> +/- 10%
  - WC VT Corners
  - Aging: 20+ yrs @ 55°C
  - Error free, 90M+ tests, FPGA, ASIC

#### **Security Framework**

- Security-Optimized: "Insufficient eqs to learn sys"
- PUF-Optimized: "Use unlearnable part of system"

#### **Future Work**

• De-rate results to account for side channel information

#### THANK YOU!

Implied error rate <= 12 parts per *billion* 

No i.i.d. PUF output assumption



## Extras:

## PUF Randomness / Uniqueness



## **NIST Randomness Test Results**

Verayo PUF Output bits are "random" based on NIST testing

- Low bias (< +/- 1%)</li>
- Results affirmed using on other statistical testing methods

| Statistical Test       | Block/Template<br>Length | Success ratio<br>(chip #100) | Success ratio<br>(chip #101) | Success ratio<br>(chip #102) | Success ratio<br>(chip #103) | Reference<br>bitstream <sup>1</sup> |
|------------------------|--------------------------|------------------------------|------------------------------|------------------------------|------------------------------|-------------------------------------|
| Frequency              | -                        | 99%                          | 99%                          | 98%                          | 99%                          | 98%                                 |
| BlockFrequency         | 128                      | 100%                         | 100%                         | 99%                          | 99%                          | 97%                                 |
| CumulativeSums         | -                        | 99% - 99%                    | 99% - 100%                   | 97% - 98%                    | 99% - 99%                    | 98% - 99%                           |
| Runs                   | -                        | 97%                          | 99%                          | 100%                         | 99%                          | 100%                                |
| LongestRun             | -                        | 100%                         | 100%                         | 99%                          | 99%                          | 97%                                 |
| Rank                   | -                        | 100%                         | 98%                          | 100%                         | 100%                         | 100%                                |
| FFT                    | -                        | 100%                         | 100%                         | 100%                         | 100%                         | 100%                                |
| NonOverlappingTemplate | 9                        | 94% - 100%                   | 95% - 100%                   | 95% - 100%                   | 95% - 100%                   | 95% - 100%                          |
| Overlapping Template   | 9                        | 98%                          | 98%                          | 99%                          | 98%                          | 97%                                 |
| Universal              | -                        | 97%                          | 98%                          | 100%                         | 96%                          | 100%                                |
| ApproximateEntropy     | 10                       | 100%                         | 99%                          | 99%                          | 99%                          | 100%                                |
| RandomExcursions       | -                        | 98%-100%                     | 97% - 100%                   | 97% - 100%                   | 98% - 100%                   | 98% - 100%                          |
| RandomExcusionVariant  | -                        | 97% - 100%                   | 97% - 100%                   | 97% - 100%                   | 96% - 100%                   | 93% - 100%                          |
| Serial                 | 16                       | 99% - 99%                    | 99% - 100%                   | 99% - 100%                   | 98% - 98%                    | 98% - 100%                          |
| LinearComplexity       | 500                      | 100%                         | 99%                          | 99%                          | 99%                          | 100%                                |
|                        |                          |                              |                              |                              |                              |                                     |
| Cumulative p-values    |                          | 100% (188/188) pass                 |
| Cumulative proportions |                          | 99% (187/188) pass           | 99% (187/188) pass           | 99% (187/188) pass           | 99% (187/188) pass           | 98% (184/188) pass                  |

<sup>1</sup> From George Marsaglia's Random Number CDROM.



## **Uniqueness Analysis**

ERAYO

240 PUF devices (12 FPGAs, 20 PUFs each)

